Global, regional, and cryptic population structure in a high gene-flow transatlantic fish

Lumpfish (Cyclopterus lumpus) is a transatlantic marine fish displaying large population sizes and a high potential for dispersal and gene-flow. These features are expected to result in weak population structure. Here, we investigated population genetic structure of lumpfish throughout its natural distribution in the North Atlantic using two approaches: I) 4,393 genome wide SNPs and 95 individuals from 10 locations, and II) 139 discriminatory SNPs and 1,669 individuals from 40 locations. Both approaches identified extensive population genetic structuring with a major split between the East and West Atlantic and a distinct Baltic Sea population, as well as further differentiation of lumpfish from the English Channel, Iceland, and Greenland. The discriminatory loci displayed ~2–5 times higher divergence than the genome wide approach, revealing further evidence of local population substructures. Lumpfish from Isfjorden in Svalbard were highly distinct but resembled most fish from Greenland. The Kattegat area in the Baltic transition zone, formed a previously undescribed distinct genetic group. Also, further subdivision was detected within North America, Iceland, West Greenland, Barents Sea, and Norway. Although lumpfish have considerable potential for dispersal and gene-flow, the observed high levels of population structuring throughout the Atlantic suggests that this species may have a natal homing behavior and local populations with adaptive differences. This fine-scale population structure calls for consideration when defining management units for exploitation of lumpfish stocks and in decisions related to sourcing and moving lumpfish for cleaner fish use in salmonid aquaculture.


Kevin Glover
General guidance is provided below.
Consult the submission guidelines for detailed instructions. Make sure that all Ethical review and approval was not required for the animal study because This is a common commercial species and the fish used in this study were not killed specifically to enable this study. All samples in this study were either taken from fish which were already sampled for other studies or taken from already dead fish caught commercially or for other surveys. In each country involved, national laws and regulations were strictly followed.

Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation
information entered here is included in the Methods section of the manuscript.  During this migration, they will make extensive vertical movements through the water 90 column, and show a greater association with the sea bed [36]. Males arrive at the coast earlier 91 was recalibrated (VQSR) using site identity across technical replicates as a training set. Sites 153 with more than 10% missing data and a fraction of heterozygotes above 0.5 (possible lumped 154 paralogs) were removed, leaving a total of 7,301 SNPs. After adding a minor allele frequency 155 (MAF) filter of > 0.1 in the total dataset, and a minor allele count of at least two, a final 156 dataset of 4,393 SNPs, was retrieved. This genome-wide dataset was used to investigate 157 general global population genetic patterns of lumpfish, as well as to select a subset of 158 discriminatory SNPs. 159

Selection of SNP markers and genotyping 160
To find regionally discriminatory markers, non-linked (i.e not in linkage disequilibrium) 161  Table 2a), and many of 184 them were located within (N=64) or in the immediate vicinity (less than 2kb away; N=16) of 185 known/annotated gene sequences. 186

Population genetic analysis 187
The genome-wide, sequence-derived dataset consisting of 95 fish from 10 locations and 188 4,393 SNPs, is hereafter referred to as the "genome-wide dataset". The genotype data from 189 the final 139 SNP panel is referred to as the "targeted dataset". All analyses were conducted 190 with both datasets unless otherwise stated. Besides analyzing samples, several analyses were 191 carried out using larger pooled groups based on geographic regions (as given in Table 1). 192 When applicable, grouping was made differently based on results received from the upstream 193 analyses. 194 correlated allele frequencies and with location information given a priori [68]. A total of 50 220 000 MCMC (Markov Chain Monte Carlo) repetitions were used in each run after 20 000 221 repeats were discarded as burn-in. K was set from 1 to 10 or 12 (depending if whole or partial 222 dataset was used, see below), and the number of iterations was set to 5. To determine the 223 optimal K, bar plots were inspected visually and runs analyzed with the StructureSelector 224 software [69]. The software summarizes results as the optimal Ln Pr(X|K) given by the 225 LargeKGreedy algorithm and 2,000 repeats. 230

Genetic variation and its division
As higher levels of structure can effectively mask lower levels of structure [71] a hierarchical 231 approach was employed. The above-mentioned clustering analyses were therefore performed 232 first for the whole dataset, followed by a separate analysis for sampling sites that were 233 geographically close and showed similar admixture profiles. For each of the investigated 234 regions, the most informative loci behind the observed structure were determined with the 235 help of PCA loadings. The threshold for being informative was set to 0.1. 236

Population graphs 237
Genetic structure for the targeted SNP data was also analyzed with Population Graphs, a 238 graph-theoretic approach that uses conditional genetic distance Results 300

Genetic variation and its division 301
Expected heterozygosity over the 4,393 SNPs varied between 0.124 and 0.151 across samples 302 (Supplementary Table 3 FST was 0.094 (SD ± 0.057) for the genome-wide dataset, and slightly higher, 0.115 (SD ± 311 0.007) for the targeted dataset. Three major groups were clearly identified: The highly-312 differentiated West and East Atlantic (FST (genome-wide) 0.087-0.154), and the Baltic Sea (FST 313 (genome-wide) >0.08 and >0.16 against West and East Atlantic, respectively). Estimates of 314 genetic divergence among these three groups were consistently higher, about two to five-fold, 315 for the targeted dataset (Table 2, Supplementary Table 4b). 316     Table 1. 332

333
In the Arctic, Isfjorden stood out as highly differentiated from the samples collected offshore 334 from Svalbard and the Barents Sea area, as well as from northern Norway and Iceland 335 (Supplementary Table 4a genome-wide and targeted datasets, and explained 11.1% and 10.9% of the total variation, 347 respectively. The second axis separated the Baltic Sea samples from the rest and explained 348 3.45% and 4.60%, respectively. Using the DAPC approach and regional partitioning of the 349 data showed similar results as the PCA but with clearer clustering (Supplementary File 1). The results from STRUCTURE were consistent with those from the PCA/DAPC approaches 361 but provided additional insights. In the genome-wide dataset, the division between western 362 and eastern Atlantic was clear, but many individuals also shared signs of admixture ( Figure  363 3). In the East Atlantic, all lumpfish shared at least ~ 50% of their genetic information. K=7 364 was deemed as the best fit for the data (Supplementary   of samples, which due to their genetic and geographic closeness were analyzed together. 396 Each pie chart shows one sample's assignment to the selected number of clusters (K) 397 averaged across individuals. Note that some of the samples are included in multiple 398 assignments but that Isfjorden and the English Channel samples were not included in any 399 regional structure analysis due to their uniqueness.   significantly associated (p < 0.001) with the observed patterns of genetic structure. Together 437 they explained 55.5% of the total variation ( Figure 6). Partial analysis removing the effect of 438 geography still showed a significant association (R 2 = 0.086, p < 0.001), albeit much weaker. 439 Outliers were defined as being at least 2.5 standard deviations away from the mean SNP 440 loadings along the significant axes. Only a single SNP (Lump-165) passed that criterion. The 441 same SNP was denoted as a putative outlier in the LFMM analysis with both datasets 442 (Supplementary Table 7b; Supplementary Table 8). 443

Regional population structures 486
We detected different degrees of sub-structuring within the large regional groups. Some 487 finer-scale population patterns for lumpfish have already been described in previous studies 488 where isolation-by-distance was detected in West Greenland [44], and also suggested in 489 Norway where Averøy in mid-Norway was found to be different from other Norwegian 490 samples [45]. Here we found further structuring within regions that have not been previously 491 described. When we examined the genetic results jointly with geographic location and 492 information on life history stages (Table 1), we found that samples with only breeding and 493 juvenile lumpfish formed regional and clear-cut groups, whereas migrating fish, mid-sea 494 samples and samples known to contain both juveniles and adults were more admixed. Genetic 495 divergence of regional breeding populations is thus in line with observations which suggest 496 natal homing in lumpfish [41,42]. 497

West Atlantic 498
In the western Atlantic, we found significant genetic differentiation among samples collected 499 from Greenland, Canada and USA, as well as a north-south division along Greenland's west-500 coast, as described previously [43,44]. Furthermore, the targeted SNP panel produced 501 distinct clustering patterns for the groups (Figure 4; Supplementary Table 5), suggesting good 502 regional resolution, and could likely be used to e.g., detect migrants. Indeed, two individuals 503 collected in Maine, USA, had high admixture proportions (~50 and ~80%) and resembled the 504 Canadian sample, suggesting that they could be migrants or hybrids between the two 505 populations. 506

Isfjorden 507
Despite being geographically closer to several samples from the East Atlantic, the sample 508 from Isfjorden in Svalbard clustered closely with samples from western Greenland. As this 509 sample was highly divergent from off-shore Svalbard samples, and we did not detect any 510 individuals with similar genetic make-up in any nearby samples, this fjord population may 511 represent a regionally divergent gene pool, perhaps specially adapted to its Arctic 512 environment [92]. Svalbard is surrounded by strong Arctic currents that run north and west 513 towards the Arctic Ocean and the Greenland Sea (Supplementary Figure 6). However, the 514 closest sample in Greenland in this study was over 3,000 kilometers away (along the shortest 515 ice-free waterway). It is unknown how far north lumpfish reach, but historical Arctic 516 connectivity could explain the observed pattern. However, pairwise FST estimates and 517 Popgraph analysis show a higher resemblance between Isfjorden and southern Greenland 518 compared to northern Greenland, suggesting a more recent connection. Either Isfjorden is an 519 isolated relic of past gene-flow or a manifestation of still on-going connectivity. Strong cold 520 polar currents have also been proposed as the barrier separating the West and East Atlantic 521 [43], Greenland and Canada, as well as Greenland and Iceland lumpfish populations [44]. 522 These divisions align with our results, which showed significant divergence and no likely 523 migrants across these borders. 524

Iceland 525
We found a weak but statistically significant east-west subdivision across the Westfjords in 526 Iceland. Breiðafjörður appeared most distinct, whereas fish from Sandgerði and Bolungarvik 527 were more admixed (Figure 4). This pattern was driven by only three outlier loci 528 (Supplementary Table 2c), all of them in chromosome 13, and thus possibly suggesting a 529 structural variant or a genomic region for local selection. A region in this chromosome, close 530 to the outliers reported here, was recently discovered to be highly associated with sex [93], 531 indicating that the observed structure could be related to differences between males and 532 females. Repeated sampling of fish of opposite sex from the same location, Skagastrønd 533 (SKA1_IS and SKA2_IS), did not show any significant differentiation, however (FST = 534 0.0007, NS). This suggests that sex is unlikely to bias the results. 535

English Channel 536
In the English Channel, lumpfish were genetically differentiated from all other samples 537 (Table 2). This pattern was also observed using microsatellites in a previous study by 538 Whittaker et al. (2018), which used the same individuals as in this study. As suggested 539 therein, we agree that a plausible mechanism behind the southernmost population 540 differentiation is warmer water (Supplementary Table 1a), causing separation in spawning 541 time and thus temporal isolation restricting gene-flow. 542

Baltic Sea and the transition zone Skagerrak and Kattegat 543
We confirm the previous observations [43,45] of high genetic divergence of the Baltic Sea 544 lumpfish and high similarity between individuals within the Baltic (Figure 4; Supplementary 545 Table 4a) and lower genome-wide variability (Supplementary Table 3). This is likely a 546 combination of genetic drift due to limited number of founders and isolation, as well as 547 strong diverging selection pressures in this marginal brackish environment, as has been 548 shown for other species [17,94,95]. In fact, we show that variation in salinity together with 549 temperature range, as well as minimum temperature with dissolved oxygen level, were 550 significant factors explaining the genetic population structure of lumpfish. This was true even 551 when the geographic variation was accounted for (Supplementary Figure 5), indicative of 552 local adaptation in Baltic lumpfish. Nonetheless, one individual of Baltic Sea genetic origin 553 was found in the Skagerrak off-shore sample, despite the large difference in salinity between 554 the two areas, suggesting that adult Baltic Sea lumpfish can survive in a much more saline 555 environment than they inhabit. 556 The salinity transition zone between the North Sea and Baltic Sea showed surprisingly high 557 levels of structuring. While the Skagerrak area, adjacent to the North Sea, was clearly 558 clustered with the larger East Atlantic group (see below), the Kattegat formed its own genetic 559 cluster. Individuals from Orust in the northernmost parts of Kattegat showed some degree of 560 assignment to the Skagerrak cluster. However, all other samples in Kattegat, including 561 samples next to the entrance of the Baltic Sea, were highly differentiated from both the Baltic 562 Sea and Skagerrak (Figure 4) where all individuals were clearly assigned to one genetic group or another. We had detailed 603 catch information of all fish from one of the fjords, Boknafjord (data not shown). 604 Examination of the genetic clusters did not reveal any relationship with the status of the fish 605 (juvenile vs adult) nor sex but when we compared the genetic groups with the geographic 606 location of each fish, spatial division was evident. Within this large fjord, ~100 km in length 607 and covering an area of 1579 km 2 , fish that were caught closer to the open sea clustered with 608 the large East Atlantic group, whereas the group found mainly deeper inside the fjord formed 609 another, distinct regional group (Supplementary Figure 7). This is likely the same genetic 610 group as Whittaker et al. [45] reported, who also found a distinct genetic group from this 611 same region. 612 It is unclear what the regionally diverged coastal group in the southwestern fjords represents, 613 but there are some possibilities: First, these fish could represent local lumpfish ecotypes. 614 samples used in the initial 2b-RAD-analysis (see Table 1) and in the SNP selection phase 638 were determined based on their location and previous knowledge of the genetic population 639 structure for this species. This selection process has a few potential pitfalls: Studies 640 conducted with a limited representation of the species' genome and based on few individuals 641 can be prone to biases. The same applies to all later sampling stages if the samples used 642 deviate from their source population (see e.g. [117-119]). Because genetic diversity is 643 unequally distributed across populations, the populations in which SNPs are discovered may 644 contribute to ascertainment bias. We tried to minimize known sources of biases in the marker 645 development phase by including several sampling locations, using fish of both sexes and 646 selecting markers with a wide genomic coverage. Whenever possible, samples from each site 647 were collected at multiple time points and from a wider area. As RADseq is a non-selective 648 process in relation to which genomic regions are to be included, the final genomic 649 representation is random. Thus, population processes mediated by neutral genetic markers, 650 such as demographic history, can reliably be studied as they usually affect the entire genome. 651 However, selection-related processes can be missed since selection often affects relatively 652 small and targeted parts of the genome, many of which may not be included in random 653 RADseqs [120]. Therefore, this study will tell only a partial and likely somewhat biased story 654 of the lumpfish population structure, and more comprehensive studiesgeographically 655 and/or genome-wisecould provide with other pieces of evidence. 656 However, outlier analysis remains a powerful approach to study larger patterns of 657 differentiation. In this study, we found that the targeted dataset performed equally well or 658 even better than the genome-wide dataset at separating lumpfish populations regionally. This 659 is despite the fact that only a small portion (3-17 SNPs; Supplementary Table 2c) of the 660 selected 139 SNPs were informative within each region. The targeted dataset displayed ~2 to 661 5 times higher level of divergence (Table 2), and greater clarity than the genome-wide dataset 662 ( Figure 3). In addition to the previously described population structure, our analyses revealed 663 further genetic structure. Furthermore, our analyses revealed evidence of likely migrants and 664 hybrids on both sides of the Atlantic. 665

Conclusions and management implications 666
Lumpfish occupy vast and highly variable marine environments and have high migration